Second version of encTeX: UTF-8 support

Author

  • Petr Olšák
Abstract

The UTF-8 encoding keeps the standard ASCII characters unchanged and encodes the accented letters of our alphabets in two bytes. Standard 8-bit TeX is not ready for UTF-8 input because it has to handle such a single character as two tokens. This means you cannot set \catcode, \uccode, etc. for these characters, and you cannot use \futurelet on the next character in the normal sense. The second version of my encTeX solves these problems. encTeX is fully backward compatible with the original TeX. It adds ten new primitives with which you can set or read the conversion tables used by the input processor of TeX or used during output to the terminal, log, and \write files. The second version makes it possible to convert multi-byte sequences to one byte or to a control sequence. You can implement up to 256 UTF-8 codes as one byte and an unlimited number of other UTF-8 codes as control sequences. All internals of 8-bit TeX work in the same way as if a “normal one-byte encoding” of input files were used. I think that the UTF-8 encoding will become more common. In that situation, there is no other way than to modify the input processor of TeX; otherwise 8-bit TeX will be dead in a short time.
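
The conversion idea in the abstract can be illustrated outside TeX. The following is a minimal Python sketch, not encTeX's implementation and not its primitives: it assumes ISO-8859-2 as the internal one-byte encoding, and the names `build_table`, `convert`, and the control sequence `\euro` are hypothetical, chosen only for this example. It shows multi-byte UTF-8 sequences being replaced by a single byte or by a control sequence before any further processing sees them.

```python
# Illustrative sketch only: encTeX does this inside the TeX input processor.
# Assumptions: ISO-8859-2 as the one-byte target encoding; all names are hypothetical.

def build_table(chars, encoding="iso-8859-2"):
    """Map the UTF-8 byte sequence of each character to its single byte."""
    return {ch.encode("utf-8"): ch.encode(encoding) for ch in chars}

def convert(data, table):
    """Replace every known multi-byte sequence, trying the longest match first."""
    max_len = max((len(key) for key in table), default=1)
    out = bytearray()
    i = 0
    while i < len(data):
        for n in range(max_len, 1, -1):
            chunk = data[i:i + n]
            if chunk in table:
                out += table[chunk]
                i += n
                break
        else:
            # No multi-byte match: copy the byte unchanged (plain ASCII stays as-is).
            out.append(data[i])
            i += 1
    return bytes(out)

# Accented letters become one byte each.
table = build_table("šá")
# A code outside the one-byte set becomes a (hypothetical) control sequence.
table["€".encode("utf-8")] = b"\\euro "

utf8_input = "Olšák 5€".encode("utf-8")
converted = convert(utf8_input, table)
```

Here `š` and `á` each occupy two bytes in the UTF-8 input and one byte after conversion, so code that works byte by byte sees one unit per letter, which is the property the abstract relies on for \catcode and \futurelet.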

Similar resources

Providing some UTF-8 support via inputenc

3 Mapping characters - based on font (glyph) encodings: 3.1 About the table itself; 3.2 The mapping table; 3.3 Notes; 3.4 Mappings for OT1 glyphs; 3.5 Mappings for OMS g...

Putting the Cork back in the bottle: Improving Unicode support in TeX

Until recently, all of the hyphenation patterns available for different languages in TeX were using 8-bit font encodings, and were therefore not directly usable with UTF-8 TeX engines such as XeTeX and LuaTeX. When the former was included in TeX Live in 2007, Jonathan Kew, its author, devised a temporary way to use them with XeTeX as well as the “old” TeX engines. Last spring, we undertook to c...

Package ‘Rmalschains’ August 29, 2013

August 29, 2013 Maintainer Christoph Bergmeir License GPL-3 | file LICENSE Title Continuous Optimization using Memetic Algorithms with Local Search Chains (MA-LS-Chains) in R LinkingTo Rcpp Type Package LazyLoad yes Author Christoph Bergmeir, Daniel Molina, José M. Benítez Description This package implements an algorithm family for continuous optimization called memet...

Package ‘cp4p’

May 16, 2016 Type Package Title Calibration Plot for Proteomics Version 0.3.5 Date 2016-05-11 Author Quentin Giai Gianetto, Florence Combes, Claire Ramus, Christophe Bruley, Yohann Couté, Thomas Burger Maintainer Quentin Giai Gianetto Description Functions to check whether a vector of p-values respects the assumptions of FDR (false discovery rate) control procedures and to ...

1-0 Transformation Form of UTF-8

Based on Multilevel Mark Theory, the 11-10 and 1-0 transformation forms of UTF-8 are proposed in this paper. The transformation between UCS and the 1-0 form of UTF-8 is introduced; then the transformation between Local Code and the 1-0 form of UTF-8 is discussed in detail.

Publication date: 2005